Conventional ability tests require all testees to answer the same set of
test  items.  Because  testees  differ  in  ability  level,  however,  tests  of  this 
kind  may  potentially  create  differential  psychological  environments  for  testees 
of  different  ability  levels.  A test  which  is  appropriately  difficult  for  a 
testee  of  average  ability  may  be  perceived  by  less  able  individuals  as  being 
much  too  difficult,  and  such  perceptions  may  lead  these  testees  to  approach  the 
task  with  anxiety  and  forbearance.  On  the  other  hand,  individuals  with  higher 
than  average  abilities  may  find  the  task  a simple  or  even  pleasant  one. 

Clearly,  the  psychological  environment  of  a testee  may  vary  greatly  depending 
on  the  individual's  perception  of  the  task. 

Adaptive  tests  are  designed  such  that  each  testee  receives  items  which  are 
psychometrically  appropriate  for  his/her  ability  level  (Lord,  1970;  Weiss,  1974; 
Weiss  & Betz,  1973).  For  example,  items  in  such  tests  may  be  chosen  so  that 
each  testee,  regardless  of  ability  level,  will  have  approximately  a fifty-percent 
chance of answering the item correctly (e.g., Lord, 1970). The adaptive test
may  thus  reduce  the  differential  psychological  environments  arising  from  the 
administration  of  a fixed  set  of  items  to  persons  of  differing  ability  levels, 
and  may  thereby  improve  the  performance  of  low-ability  students.  In  fact, 
under  certain  conditions,  adaptive  testing  has  been  shown  to  be  more  motivating 
for low-ability testees (Betz & Weiss, 1976b) and to result in higher ability
estimates (Betz & Weiss, 1976a).

Holtzman  (1970)  points  out  the  potential  importance  of  psychological  factors 
in the estimation of an individual's ability:

It  may  be  important  to  investigate  the  interaction  of  personality 
and  situational  factors  with  tailored  testing.  The  motivational  impact 
on  the  student  when  he  discovers  that  most  of  the  items  are  at  a certain 
level  of  difficulty  (or  uncertainty)  is  unknown.  The  optimal  level 
(or  mixture  of  levels)  for  a given  student  will  not  be  derived  from  test 
theory  alone;  information  about  student  anxiety  and  motivation  may  also 
be relevant (p. 199).

Whether  adaptive  tests  can  actually  reduce  the  differential  psychological 
effects  due  to  the  administration  of  an  inappropriately  easy  or  difficult  set 
of  test  items  depends  largely  on  whether  testees  can  accurately  perceive  the 
difficulties  of  the  items  administered.  Little  research  has  dealt  directly 
with  the  question  of  item-difficulty  perception. 

Munz and Jacobs (1971) asked introductory psychology students to scale
multiple-choice examination questions on the subjective difficulty an
introductory psychology student would experience in reaching a solution to a
particular test question. Thurstone's method of equal-appearing intervals was
used to derive difficulty scale values for the individual items. These scale
values correlated positively but moderately (r = .52) with traditional
proportion-correct difficulty indices based on the subsequent administration
of those items to other introductory psychology students. However, Munz and
Jacobs made no attempt to determine the accuracy with which individuals
perceived item difficulties relative to their own levels of ability. Further,
these results may be generalized only to other achievement-testing situations
where students have been exposed to the material and have made an attempt to
familiarize themselves with it.

Bratfisch, Dornic, and Borg (1972) asked individuals to estimate the
subjective difficulty of items from sets A, B, D, and E of Raven's Standard
Progressive Matrices. The items were first administered conventionally, in the
order of their "objective" difficulty as assessed by determining the proportion
of correct responses in a norming sample. Following this, the items were
presented in random order and estimates of their subjective difficulties were
obtained through a magnitude estimation procedure. The Spearman rank-order
correlation between the subjective difficulties of the items and the order of
their initial administration (i.e., their ranked "objective" difficulty) was
positive and high (rho = .90). Unfortunately, the effect of the items' prior
administration in the order of their objective difficulty cannot be determined.

In another study by the same authors (Bratfisch, Borg & Dornic, 1972),
testees  were  administered  numerical-reasoning,  spatial-ability,  or  verbal- 
comprehension  items  in  the  order  of  "objective"  difficulty  of  the  items  in  the 
tests.  Immediately  after  attempting  to  answer  each  item  in  the  conventional 
manner,  the  testees  rated  the  item's  difficulty  on  a nine-point  scale  where 
1 corresponded  to  a "very,  very  easy"  item  and  9 corresponded  to  a "very,  very 
hard"  item.  The  Spearman  correlations  between  order  of  administration  and 
perceived  difficulty  for  the  numerical-reasoning,  spatial-ability,  and  verbal- 
comprehension  tests  were  .97,  .92,  and  .92,  respectively.  Unfortunately,  in 
both  studies  by  these  authors,  the  subjective  difficulties  were  not  explicitly 
related  to  the  testees'  perceptions  of  an  item's  appropriateness  to  their 
ability  levels.  More  importantly,  in  both  studies,  it  is  impossible  to  separate 
the  effect  of  item  difficulty  from  that  of  order  of  administration. 

The  present  study  was  designed  to  determine  whether  or  not  testees  can 
perceive  the  difficulties  of  ability  test  items  relative  to  their  levels  of 
ability  and,  if  so,  to  investigate  the  accuracy  of  these  perceptions  for 
individual  items.  Additionally,  the  study  was  designed  to  determine  the  level 
of  item  difficulty  perceived  by  testees  as  being  appropriate  for  their  ability. 


Method 


Test  Construction 


Two 41-item conventional tests were designed which had a large range of
differences between the difficulties of successive items. Items for the tests
were chosen from a pool of five-alternative, multiple-choice vocabulary items
on the basis of their normal-ogive difficulty ($b$) and discrimination ($a$)
parameters (Lord & Novick, 1968). One of the tests was designed to be
administered to a group of relatively low-ability college students. The other
test was designed to be administered to a group of relatively higher ability
students.

The item parameter estimates were based initially on data reported by
McBride and Weiss (1974), derived from samples of University of Minnesota
undergraduates. These parameter estimates were revised using a procedure
essentially the same as that described by Jensema (1976). Appendix A
describes the process of developing the revised item parameters. The difficulty
and discrimination parameters for each test item are shown in Appendix Table B-1.

The low- and high-ability tests had mean difficulties of $\bar{b} = -2.190$ and
$\bar{b} = -.488$, respectively. Mean discrimination values for the low- and
high-ability tests were $\bar{a} = 1.117$ and $\bar{a} = 1.501$, respectively.

Procedure 


Subjects. Two groups of undergraduate students participated in this study.
The  first  group  consisted  of  119  students  from  psychology  classes  in  the 
University  of  Minnesota's  General  College  (GC)  who  were  tested  in  the  winter 
of  1975.  The  second  group,  tested  in  the  spring  of  1975,  consisted  of  185 
students  from  an  introductory  psychology  class  in  the  University's  College  of 
Liberal Arts (CLA). All students were volunteers who received points toward
their  final  course  grades  for  participation  in  the  experiment.  GC  students 
typically  perform  more  poorly  on  ability  and  aptitude  tests  than  do  CLA 
students;  for  the  purposes  of  this  study,  the  GC  students  will  therefore  be 
designated  as  the  "low-ability"  group  while  the  CLA  students  will  be  referred 
to  as  the  "high-ability"  group. 

Test  administration.  All  students  were  tested  at  individual  cathode-ray 
terminals  (CRTs)  connected  to  a Hewlett-Packard  9600E  real-time  computer  system. 
Instructional  screens  similar  to  those  described  by  DeWitt  and  Weiss  (1974, 
pp.  36-53)  explained  the  operation  of  the  CRTs  before  the  actual  testing  was 
begun.  In  addition,  a proctor  was  present  in  the  testing  room  to  provide 
assistance  in  the  operation  of  the  equipment. 

Each  student  answered  41  multiple-choice  vocabulary  test  items.  The 
first  six  test  items  presented  were  identical  for  testees  in  a given  ability 
group.  These  items,  whose  difficulties  reflected  the  difficulty  range  of  the 
test,  served  to  familiarize  the  students  with  the  range  of  difficulties  they 
would  subsequently  encounter.  The  remaining  35  items  in  each  test  were  presented 
in  four  different  orders  of  administration  to  minimize  the  effect  that  the  order 
of  item  presentation  might  have  on  perceived  item  difficulty.  Testees  were 
sequentially  assigned  to  one  of  the  four  conditions.  Although  the  same 
procedure  was  followed  in  both  ability  groups,  the  items  differed  between 
groups.  Appendix  Table  B-1  shows  the  order  of  item  administration  in  each  of 
the  four  conditions  for  each  ability  group. 

Prior  to  the  administration  of  the  test,  the  students  were  informed  that 
they  would  have  as  much  time  as  they  needed  to  complete  the  task.  During  the 
test,  items  were  presented  on  the  CRT  screen  and  students  responded  by  typing 
the  number  corresponding  to  the  chosen  alternative  for  each  five-alternative 
multiple-choice  item.  Immediately  after  responding  to  an  item,  each  student 
was asked to indicate the item's perceived difficulty by entering a difficulty
code selected from the following list:

A. Much too easy for you
B. Somewhat too easy for you
C. Just about right for you
D. Somewhat too hard for you
E. Much too hard for you.

The  testee's  response  was  then  checked  by  the  computer  to  ensure  that  one  of 
the  five  alternatives  had  been  chosen,  and  these  data  were  stored  with  the 
item-response  data  for  later  analysis. 

Design 

The  study  was  designed  to  investigate  three  different  aspects  of  item- 
difficulty  perception.  The  initial  phase  was  designed  to  determine  whether  or 
not  testees  could  accurately  perceive  the  difficulty  of  ability-test  items. 

The  second  phase  was  concerned  with  whether  or  not  a testee's  ability  level 
was  related  to  the  perception  of  the  relative  difficulty  of  a given  item; 
that  is,  how  accurate  an  individual's  perceptions  were,  relative  to  his/her 
ability  level.  The  third  phase  of  the  analysis  attempted  to  determine  the 
relative  item  difficulty  which  was  perceived  by  the  testee  as  being  about 
right  for  his/her  ability  level. 


Accuracy  of  Difficulty  Perceptions 


Method  of  Analysis 


Difficulty perception model.¹ An individual's perception of an item's
difficulty can be thought of as the signed distance between the person's
ability level and the item's difficulty level in a Euclidean ability/difficulty
space. This perception will be denoted by

$$d_{ij} = \sum_{p=1}^{P} w_{jp}\,(x_{jp} - x_{ip}) \tag{1}$$

where $d_{ij}$ is the perceived difficulty of item $j$ for person $i$,
      $x_{jp}$ is the difficulty of item $j$ along ability/difficulty dimension $p$,
      $x_{ip}$ is the ability of person $i$ along ability/difficulty dimension $p$,
      $w_{jp}$ is the weight of item $j$ along dimension $p$, and
      $P$ is the number of dimensions in the ability/difficulty space.

Thus, in this model, the difficulty of an item for a given person is defined
as the weighted sum of the signed distances between the location of the item
and the location of the person along $P$ ability/difficulty dimensions. For the
present analysis, numerical values of $d_{ij}$ were assigned to each alternative
on the rating scale. The values assigned to alternatives A through E were -2,
-1, 0, +1, and +2, respectively. Thus, $d_{ij}$ increased as the perceived
difficulty of an item increased, and $d_{ij}$ was equal to zero when an item was
perceived by a testee as "just about right for [me]."

¹Appreciation for the development of this model is expressed to Mark Davison,
Assistant Professor of Educational Psychology, University of Minnesota.

The  use  of  a model  such  as  that  in  Equation  1 is  advantageous  for  several 
reasons.  Using  the  difficulty  ratings  alone,  estimates  of  individual  ability 
levels  and  item  difficulties  can  be  derived  on  a common  metric.  In  addition, 
the  general,  multidimensional  form  of  the  model  may  be  particularly  useful  in 
describing  difficulty  perceptions  on  multi-ability  test  batteries  or  other 
such  multi-trait  instruments. 


Note that $P$ in the model corresponds to the number of dimensions in the
space. If the item difficulty ratings are unidimensional, $P$ will equal 1 and
$d_{ij}$ can be expressed more simply as

$$d_{ij} = w_j\,(x_j - x_i). \tag{2}$$

Further, if the items are assigned unit weights, the expression in Equation 2
becomes

$$d_{ij} = x_j - x_i. \tag{3}$$


If the model and the assumption of unidimensionality are appropriate and
the average ability level within a group of testees is arbitrarily set at zero,
a least squares estimate of a single item's difficulty ($x_j$) is found to be

$$\hat{x}_j = \frac{1}{N}\sum_{i=1}^{N} d_{ij} \tag{4}$$

where $N$ is the number of persons rating the item. Thus, an estimate of an
item's difficulty is simply the average difficulty rating assigned to that
item by the individuals being tested.
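The step to Equation 4 can be made explicit. The following is a sketch under the assumptions that every person rates every item and that the fitting criterion is the sum of squared residuals of Equation 3. The symbol $Q$ for that criterion and $n$ for the number of items are used here for illustration only.

```latex
\begin{aligned}
Q &= \sum_{i=1}^{N}\sum_{j=1}^{n}\bigl(d_{ij} - x_j + x_i\bigr)^2, \\
\frac{\partial Q}{\partial x_j} &= -2\sum_{i=1}^{N}\bigl(d_{ij} - x_j + x_i\bigr) = 0
\quad\Longrightarrow\quad
\hat{x}_j = \frac{1}{N}\sum_{i=1}^{N} d_{ij} + \frac{1}{N}\sum_{i=1}^{N} x_i
          = \frac{1}{N}\sum_{i=1}^{N} d_{ij}.
\end{aligned}
```

The last equality uses the convention that the average ability level within the group is set at zero, so the term $\frac{1}{N}\sum_i x_i$ vanishes.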


Similarly, a least squares estimate of $x_i$, the ability level of person $i$, is

$$\hat{x}_i = -\frac{1}{n}\sum_{j=1}^{n} d_{ij} + \frac{1}{n}\sum_{j=1}^{n} \hat{x}_j \tag{5}$$

where $n$ is the number of items administered. An estimate of an individual's
ability level is thus the average item difficulty in a set of items minus the
average difficulty rating he/she assigns to the items in that set.
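Equations 4 and 5 reduce to column and row averages of the matrix of difficulty ratings. The following is a minimal sketch, assuming complete data (every person rates every item) and the -2 to +2 coding of alternatives A through E given above. The names `RATING_VALUES` and `ratings_based_estimates` and the toy ratings are illustrative and do not come from the study.

```python
from statistics import mean

# Numerical values the text assigns to rating alternatives A through E.
RATING_VALUES = {"A": -2, "B": -1, "C": 0, "D": 1, "E": 2}


def ratings_based_estimates(ratings):
    """Return (item_difficulty, ability) from a persons-by-items grid of codes.

    ratings[i][j] is the code person i gave to item j. Complete data are
    assumed for this sketch.
    """
    d = [[RATING_VALUES[code] for code in row] for row in ratings]
    n_items = len(d[0])

    # Equation 4: item difficulty is the mean rating over persons.
    item_difficulty = [mean(row[j] for row in d) for j in range(n_items)]

    # Equation 5: ability is the mean item difficulty minus the person's
    # own mean rating.
    mean_difficulty = mean(item_difficulty)
    ability = [mean_difficulty - mean(row) for row in d]
    return item_difficulty, ability


# Hypothetical ratings from three persons on three items.
toy = [
    ["A", "C", "E"],
    ["B", "D", "E"],
    ["A", "B", "D"],
]
items, abilities = ratings_based_estimates(toy)
```

With complete data the ability estimates average to zero by construction, which matches the convention used to identify the scale.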


Accuracy of ratings-based estimates. The estimates of item difficulties
and individual ability levels described by Equations 4 and 5 are based solely
on the testees' ratings of relative item difficulties. In order to determine
the appropriateness or accuracy of these perceptions, the ratings-based
estimates of item difficulties and students' abilities were compared to more
conventional estimates based on the correctness/incorrectness of the testees'
conventional responses to the test items.

The ratings-based estimates of item difficulty were correlated with the
proportion of persons in the present study identifying the correct response
alternative and also with the normal-ogive estimates of item difficulty ($b_j$)
based on the item calibration described in Appendix A. The ratings-based
estimates of student ability were correlated with traditional number-correct
scores and maximum-likelihood ability estimates (Betz & Weiss, 1976a)
based on the normal-ogive parameters of the items.




Dimensionality of difficulty perceptions. In order to use the simple,
unidimensional form of the difficulty-perception model described above, the
unidimensionality of the difficulty ratings must be demonstrated. Because
there is no definitive test of unidimensionality, an indirect evaluation was
necessary.

McBride and Weiss (1974) suggested four criteria which, if met, constitute
sufficient evidence of unidimensionality in item-response data. According to
the criteria suggested, confirmatory evidence of unidimensionality is present
when: 1) the first common factor of the matrix of inter-item correlations is a
general factor accounting for a large proportion of the common variance and on
which all variables load highly; 2) the second and subsequent factors account
for much smaller and essentially equal proportions of the common variance;
3) the item loadings on the first factor are either all positive or all
negative; and 4) none of the above criteria are satisfied by the analysis of a
similar correlation matrix constructed from computer-generated random data.
Although these criteria were suggested in the context of the analysis of
item-response data, they are equally applicable to the analysis of the
difficulty ratings.

Accordingly, a 41x41 matrix of product-moment inter-item correlations
among the difficulty ratings was factor analyzed for each ability group.
Communalities for each item were estimated by the squared multiple correlation
of that item with all others in the matrix. Factors were extracted by the
principal axes procedure and the resulting communalities were substituted
for the prior communality estimates. This procedure continued in an iterative
fashion until the differences between the two communality estimates were
negligible.
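The iterative procedure described above can be sketched with NumPy. This is an illustration under stated assumptions rather than the authors' program: the number of factors retained at each iteration (`n_factors`), the convergence tolerance, and the simulated data are all hypothetical, since the text does not report them.

```python
import numpy as np


def principal_axis(R, n_factors=1, tol=1e-4, max_iter=200):
    """Iterated principal-axis factoring of a correlation matrix R.

    Returns (eigenvalues of the last reduced matrix, loadings, communalities).
    """
    R = np.asarray(R, dtype=float)
    # Initial communalities: squared multiple correlation of each item
    # with all other items, obtained from the inverse correlation matrix.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)          # communalities on the diagonal
        eigvals, eigvecs = np.linalg.eigh(reduced)
        order = np.argsort(eigvals)[::-1]      # largest eigenvalue first
        eigvals, eigvecs = eigvals[order], eigvecs[:, order]
        lam = np.clip(eigvals[:n_factors], 0.0, None)
        loadings = eigvecs[:, :n_factors] * np.sqrt(lam)
        new_h2 = np.sum(loadings ** 2, axis=1)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return eigvals, loadings, h2


# Simulated stand-ins for the two kinds of data being compared:
# ratings driven by one general factor, and purely random ratings.
rng = np.random.default_rng(0)
n_persons, n_items = 200, 10
general = rng.normal(size=(n_persons, 1))
real_like = 0.8 * general + 0.6 * rng.normal(size=(n_persons, n_items))
random_like = rng.normal(size=(n_persons, n_items))

eig_real, load_real, _ = principal_axis(np.corrcoef(real_like, rowvar=False))
eig_rand, load_rand, _ = principal_axis(np.corrcoef(random_like, rowvar=False))

# Criterion 3: first-factor loadings all share one sign (the sign of an
# eigenvector is arbitrary, so either all positive or all negative counts).
same_sign = bool(np.all(load_real > 0) or np.all(load_real < 0))
```

Comparing `eig_real` with `eig_rand` mirrors the comparison of real and random-data eigenvalues called for by the fourth criterion.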

Results 


Dimensionality  of  difficulty  perceptions.  Evidence  of  the  dimensionality 
of the difficulty ratings is shown in Figures 1a and 1b. These figures show
the  first  ten  eigenvalues  of  the  inter-item  correlation  matrix  based  on  the 
difficulty  ratings  for  the  low-  and  high-ability  groups,  respectively.  In  both 
figures,  the  eigenvalues  from  the  analysis  of  the  ratings  are  represented  by  a 
solid  line,  while  the  dashed  line  shows  those  resulting  from  an  analysis  of 
comparable,  computer-generated  random  data. 

In  both  ability  groups,  the  first  factor  of  the  real  data  extracted  by 
far  the  largest  amount  of  variance,  while  the  second  factor  extracted  only 
slightly  more  variance  than  did  subsequent  factors.  The  first  factors  extracted 
from  the  random  data,  on  the  other  hand,  accounted  for  little  more  variance 
than  other  random-data  factors.  The  amount  of  variance  extracted  by  the  second 
and  subsequent  factors  in  the  real  data  was  similar  to  that  extracted  by  the 
second  and  subsequent  factors  in  the  random  data. 




Table  1 lists  the  loadings  of  the  items  from  each  test  on  the  first  three 
factors  extracted  from  the  matrix  of  inter-item  correlations  of  difficulty 
ratings  for  that  test.  Each  of  the  items  loaded  positively  on  the  first 
factor  from  that  test's  data,  and  the  first  factor  loadings  were  generally  high. 
These  data  therefore  suggest  the  existence  of  a "general"  factor.  Also  shown 
in  Table  1 are  the  loadings  for  the  first  three  factors  from  the  comparable 
random  data  for  each  group.  For  these  latter  data,  the  first  factor  was 
bipolar  for  both  groups;  i.e.,  positive  and  negative  loadings  occurred  as 
frequently  on  the  first  factor  as  on  Factors  2 and  3.  In  the  real  data,  such 
bipolarity  occurred  only  on  the  second  and  subsequent  factors.  These  results 
therefore  suggest  that  for  both  ability  groups,  the  difficulty  ratings  may 
be  characterized  as  being  unidimensional. 


Accuracy of ratings-based estimates. Because the difficulty perceptions
appeared to be unidimensional, the difficulty ratings were used in conjunction
with Equations 4 and 5 to calculate ratings-based estimates of item difficulty
($\hat{x}_j$) and testee ability ($\hat{x}_i$). The estimates of item
difficulties, based solely on the difficulty ratings, are shown in Table 2.
Table 2 also shows proportion-correct ($p_j$) and normal-ogive ($b_j$)
item-difficulty estimates for each item.


In the low-ability group, estimates of item difficulty derived from the
difficulty perceptions were highly related to proportion-correct and
normal-ogive item-difficulty estimates; Pearson product-moment correlations
were r = -.86 and r = .80, respectively. The relationships between the
ratings-based difficulty estimates and the estimates based on conventional
responses to the items were similarly high for items in the high-ability
group, with respective Pearson product-moment correlations of r = -.94 and
r = .85.

Appendix Table B-2 shows, for each testee, number-correct scores and
maximum likelihood estimates of the testee's ability level ($\hat{\theta}_i$)
based on his/her conventional responses to the items and the corresponding
ability estimates based on the difficulty perceptions ($\hat{x}_i$). The
Pearson product-moment correlations of the ratings-based ability estimates
with the corresponding number-correct scores and with maximum likelihood
ability estimates were r = .55 and r = .56, respectively, for testees in the
low-ability group. For persons in the high-ability group, comparable
correlations were r = .63 and r = .59, respectively.


Difficulty  Perceptions  of  Individual  Items 

The  second  phase  of  the  analysis  assessed  the  relationship  between  the 
ability  levels  of  testees  and  the  perceived  difficulty  of  a given  item.  As  an 
individual's ability level increases relative to the difficulty level of an
item,  the  item  should  be  perceived  by  the  individual  as  being  relatively  less 
difficult.  As  student  ability  levels  decrease  in  comparison  to  an  item's 
difficulty,  the  item  should  appear  to  the  testees  as  being  relatively  more 
difficult.  Thus,  the  difficulty  rating  assigned  by  a testee  to  an  individual 
item  should  be  dependent  upon  the  discrepancy  between  the  testee's  ability 
level  and  the  item's  difficulty. 
Table 2

Least-Squares Item Difficulty Estimates Based on the Difficulty Perceptions
($\hat{x}_j$) and Corresponding Proportion-Correct ($p_j$) and Normal-Ogive
($b_j$) Item Difficulty Indices

       Low-Ability Group                High-Ability Group
  Item                            Item
  Ref.                            Ref.
   No.     x_j    p_j     b_j      No.     x_j    p_j     b_j

     2    -.58    .99   -3.81        2    -.89    .97   -3.81
     4    -.79    .99   -5.56        7   -1.11    .98   -2.32
     7    -.96    .97   -2.32       14   -1.10    .96   -2.46
    14    -.80    .97   -2.46       18    -.60    .94   -4.24
    18    -.59    .94   -4.24       19    -.95    .91   -3.81
    19    -.83    .91   -3.81       23    -.97    .99   -3.86
    20   -1.51    .96   -5.76       24   -1.15    .99   -2.37
    23    -.58    .89   -3.86       39    -.45    .90   -3.63
    24    -.83    .99   -2.37       44    -.32    .88   -1.41
    29    -.59    .96   -5.52       51    -.09    .75   -1.04
    41    -.71    .89   -6.45       56     .39    .47     .13
    44     .06    .76   -1.41       64   -1.29    .99   -2.36
    51     .18    .57   -1.04       68    -.26    .98   -2.48
    55    -.65    .94   -4.95       77    -.75    .94   -3.60
    56     .57    .32     .13       86    -.24    .82   -1.19
    62    -.94    .99   -4.95       91    -.17    .66    -.20
    64    -.97    .97   -2.36      104     .72    .47     .05
    68    -.68    .92   -2.48      108    -.36    .75   -1.16
    72    -.72    .97   -6.13      111     .72    .34     .94
    77    -.29    .79   -3.60      114     .80    .28     .96
    78    -.55    .92   -4.84      115    1.23    .16    2.02
    86     .21    .59   -1.19      120    -.35    .37    1.46
    89     .01    .78   -2.49      137    1.10    .48    -.06
    91     .08    .56    -.20      145     .40    .48     .09
   108    -.20    .57   -1.16      147     .17    .30    1.47
   111     .74    .19     .94      154     .07    .59    -.12
   114     .83    .16     .96      162    1.09    .21    1.24
   141    -.05    .61   -1.21      167     .67    .41    2.16
   145     .16    .47     .09      174     .84    .30    1.45
   154     .22    .42    -.12      182   -1.01    .99   -3.83
   162    1.12    .11    1.24      188     .91    .47    -.04
   174     .84    .18    1.45      191    -.30    .89   -1.26
   182    -.71    .97   -3.83      217     .97    .28    1.38
   188    1.09    .31    -.04      253    1.06    .29    1.44
   191    -.16    .76   -1.26      302     .90    .51     .85
   192     .36    .89   -6.52      319    1.09    .21    2.14
   198    -.51    .94   -2.50      337     .61    .42    1.18
   302     .93    .58     .85      359     .59    .16    2.07
   337     .76    .41    1.18      375    1.36    .31     .93
   375    1.34    .22     .93      383     .94    .34    1.52
   651     .84    .31     .89      514     .63    .43    1.74
Table 3

Correlations of Difficulty Ratings with Ability-Level/Item-Difficulty
Discrepancy ($r$) and Dichotomized Item Scores ($r_{bis}$)

    Low-Ability Group          High-Ability Group
  Item                       Item
  Ref.                       Ref.
   No.       r   r_bis        No.       r   r_bis

     2    -.39    -.19          2    -.31   -1.00
     4    -.27    -.67          7    -.44    -.58
     7    -.31    -.30         14    -.33    -.36
    14    -.27    -.28         18    -.21    -.60
    18    -.34    -.24         19    -.28    -.88
    19    -.26    -.78         23    -.38    -.67
    20    -.28    -.57         24    -.22    -.07
    23    -.37    -.58         39    -.30    -.73
    24    -.27    -.30         44    -.25    -.34
    29    -.40   -1.00         51    -.39    -.55
    41    -.34    -.10         56    -.38    -.75
    44    -.49    -.51         64    -.27     .07
    51    -.49    -.69         68    -.21     .20
    55    -.30    -.30         77    -.36    -.56
    56    -.40    -.67         86    -.49    -.66
    62    -.26    -.75         91    -.44    -.63
    64    -.25    -.15        104    -.44    -.69
    68    -.17     .20        108    -.41    -.49
    72    -.24    -.73        111    -.38    -.47
    77    -.39    -.47        114    -.42    -.41
    78    -.56    -.05        115    -.29    -.56
    86    -.56    -.66        120    -.31    -.33
    89    -.49    -.85        137    -.28    -.61
    91    -.34    -.23        145    -.41    -.48
   108    -.43    -.40        147    -.13    -.22
   111    -.43    -.32        154    -.33    -.38
   114    -.43    -.47        162    -.49    -.72
   141    -.41    -.48        167    -.23    -.33
   145    -.37    -.16        174    -.18    -.18
   154    -.51    -.62        182    -.30     .22
   162    -.21    -.23        188    -.48    -.65
   174    -.23    -.22        191    -.41    -.60
   182    -.27    -.30        217    -.23    -.39
   188    -.28    -.50        253     .11    -.20
   191    -.44    -.52        302    -.31    -.43
   192    -.40    -.25        319    -.41    -.61
   178    -.35    -.76        337    -.39    -.49
   302    -.03    -.37        359    -.01     .14
   337    -.19    -.44        375    -.15    -.33
   375    -.10    -.06        383    -.36    -.40
   651    -.18    -.30        514    -.50    -.45

Method of Analysis

The normal-ogive testing model permits the estimation of individual ability
levels and item difficulty levels on a common metric. Thus, an estimate of the
discrepancy between an individual's ability level and an item's difficulty is
$\theta_i - b_j$, where $\theta_i$ represents the ability level of person $i$,
and $b_j$ represents the difficulty of item $j$.

To assess the relationship between the ability-level/item-difficulty
discrepancy ($\theta_i - b_j$) and the testee's difficulty perception for a
single item ($d_{ij}$), the Pearson product-moment correlation ($r$) between
$\theta_i - b_j$ and $d_{ij}$ was computed for each item. Because the estimate
of $\theta_i$ and the estimate of $b_j$ are fallible and because it is possible
that testees' perceptions are more directly related to whether or not they can
answer the item correctly than to $\theta_i - b_j$, the biserial correlation
($r_{bis}$) between the testees' item scores (0 if incorrect, 1 if correct) and
their difficulty perceptions was also computed.

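The two per-item coefficients described above can be sketched as follows. The report does not give its computing formulas, so this assumes the textbook biserial formula (mean difference between the correct and incorrect groups, divided by the population standard deviation of the ratings, times pq over the normal ordinate at the cut). The function names and the six-person data set are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def biserial_r(scores, ratings):
    """Biserial correlation between 0/1 item scores and difficulty ratings.

    Requires at least one correct and one incorrect response to the item.
    """
    right = [r for s, r in zip(scores, ratings) if s == 1]
    wrong = [r for s, r in zip(scores, ratings) if s == 0]
    p = len(right) / len(scores)
    q = 1.0 - p
    normal = NormalDist()
    ordinate = normal.pdf(normal.inv_cdf(p))  # normal density at the cut
    return (mean(right) - mean(wrong)) / pstdev(ratings) * (p * q / ordinate)


# Hypothetical data for one item answered by six testees.
theta_minus_b = [1.5, 0.5, 0.0, -0.5, -1.5, 1.0]   # ability minus difficulty
ratings = [-2, -1, 0, 1, 2, -1]                    # d_ij on the -2..+2 scale
scores = [1, 1, 1, 0, 0, 1]                        # 1 = correct

r_discrepancy = pearson_r(theta_minus_b, ratings)
r_bis = biserial_r(scores, ratings)
```

Both coefficients are negative here because higher ability relative to the item, and a correct answer, go with lower difficulty ratings. Unlike the Pearson coefficient, the biserial coefficient computed this way is not mathematically bounded by -1 and +1 in small or non-normal samples.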
Results 


Table 3 shows the correlations of the $\theta_i - b_j$ discrepancy and the
difficulty ratings, $d_{ij}$, for items on both tests. The median correlations
were -.34 for the low-ability group and -.33 for the high-ability group.
Correlations ranged from -.56 to -.03 for the low-ability group and from -.50
to -.11 for the high-ability group.


Table  3 also  shows  the  biserial  correlations  of  the  item  scores  and  the 
difficulty  ratings  for  each  test  item.  The  median  biserial  correlations  were 
-.40  and  -.48  for  the  low-  and  high-ability  groups,  respectively.  These 
correlations ranged from -1.00 to .20 for the low-ability group and from -1.00
to  .22  for  the  high-ability  group. 


Perceptions  of  Appropriate  Item  Difficulty 


Adaptive testing procedures generally tailor a test such that item
difficulty parameters are somewhat near the estimated ability level for a given
testee, i.e., so that $\theta_i - b_j$ approaches zero. Although these items
may be "about right" in difficulty from a psychometric standpoint, they may not
be "about right" from the individual testee's point of view. The third phase of
the analysis was designed to determine the testee-ability/item-difficulty
discrepancy for an item which was perceived by the testee as being "about
right" for him/her.


Method  of  Analysis 


For  each  test  item,  an  average  6 --h  • was  computed 

z J 

the  item  rating  of  "C",  Indicating  that  they  perceived 
item  as  ''.just  about  right  " for  them. 


for  those  persons  giving 
the  dlfficultv  of  the 
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A. 

Tal'li?  H allows  Llio  avoragL-  J i repaiK'y  of  subjiits  assigning  "l~"'  to  the  item 

lot  I u'h  111  thi  Items  >m  the  two  tests.  It  is  obvious  i nin.  the  li.tta  in  Table  4 that 
tiu  " .'.br’tf  i‘Ljh‘"  pona.'pt  ions  ditfer  gfe.ttly  from  item  to  item. 

Positive values of these mean discrepancies indicate that an item was
perceived as "about right" when the difficulty level of the item ($b_j$) was,
on the average, below the testees' estimated ability level. For the
low-ability group, 28 of the 41 items had positive mean discrepancies; these
discrepancies ranged from .34 to 5.77. For the high-ability group, 20 of the
41 items had positive mean discrepancies, ranging from .14 to 4.04.

Negative values indicate a judgment of "about right" for items which are
above a testee's ability level. For the low-ability group, these ranged from
-.31 to -2.04. For the high-ability group, the range was -.06 to -2.44.

The average signed mean discrepancy was approximately 1.39 for the
low-ability testees and approximately .29 for the high-ability testees. These
averages are somewhat ambiguous because differing numbers of testees
contributed to the computation of means for individual items. The overall mean
discrepancies judged to be "about right," weighted by the number of persons
upon which each item mean was based, were approximately 1.70 and .47 for the
low- and high-ability groups, respectively.
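The difference between the unweighted and the weighted averages can be illustrated with a short Python sketch. It uses only the first three low-ability rows of Table 4 (mean discrepancy and number of students), so it shows the arithmetic rather than reproducing the overall values reported above; the function names are hypothetical.

```python
# First three rows of the low-ability columns of Table 4:
# (mean discrepancy, number of students rating the item "about right").
rows = [(2.87, 50), (4.63, 48), (1.24, 36)]


def unweighted_mean(rows):
    """Simple average of the item means; every item counts equally."""
    return sum(mean for mean, _ in rows) / len(rows)


def weighted_mean(rows):
    """Average of the item means weighted by the number of raters per item."""
    total_n = sum(n for _, n in rows)
    return sum(mean * n for mean, n in rows) / total_n


simple = unweighted_mean(rows)
weighted = weighted_mean(rows)
```

Because items rated "about right" by many testees pull the weighted average toward their own means, the two summaries can differ noticeably when the number of raters varies from item to item.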


Discussion


Least squares estimates of item difficulties, based on the difficulty
ratings assigned to the items and the unidimensional difficulty-perception
model, were closely related to difficulty indices based on conventional
responses to the items. Thus, students were able to accurately perceive
the relative difficulties of a set of test items. There was some suggestion
in the data that high-ability testees perceived item difficulties relatively
more accurately than did low-ability testees.

Similarly, ratings-based ability estimates corresponded relatively well
with more traditional ability estimates. Because these ratings-based ability
estimates were essentially an average of the difficulty ratings assigned to the
items, the positive correlations between these estimates and, for instance,
the number-correct scores indicate that as ability levels increased, the items
were rated as being relatively less difficult, on the average.

The correlations between the ratings-based ability estimates and the
number-correct scores also indicate that testees can, with a fair degree of
accuracy, perceive how well they have performed on an ability test. The
correlation of .55 for the low-ability group suggested that students in this
group were slightly less able to perceive their ability levels as assessed by
number-correct scores than were testees in the high-ability group, where
number-correct scores and ratings-based ability estimates correlated .63. In
general, however, the magnitude of the relationships between the difficulty
ratings and objective


Table 4

Mean Signed Discrepancy by Item Between Testee Ability and
Item Difficulty ($\hat{\theta}_i - b_j$) for Students Rating
an Item "Just About Right for [me]," for Two Ability Groups

         Low-Ability Group                   High-Ability Group
  Item        Mean      Number of      Item        Mean      Number of
Ref. No.  Discrepancy   Students     Ref. No.  Discrepancy   Students

     2         2.87         50            2         3.38         60
     4         4.63         48            7         1.52         47
     7         1.24         36           14         1.68         51
    14         1.47         46           18         4.04         58
    18         3.37         53           19         3.29         39
    19         2.73         42           23         3.16         61
    20         4.03          8           24         1.85         43
    23         2.97         54           39         3.29         76
    24         1.44         46           44         1.15        101
    29         4.54         50           51          .79         90
    41         5.54         49           56         -.06         59
    44          .75         52           64         1.77         34
    51          .36         49           68         2.01         82
    55         3.94         60           77         2.96         76
    56         -.75         35           86          .77         60
    62         4.00         33           91         -.29         73
    64         1.37         39          104          .14         32
    68         1.46         53          108          .85         78
    72         5.13         42          111         -.88         48
    77         2.66         60          114         -.87         48
    78         3.88         62          115        -1.85         11
    86          .61         37          120        -1.92         88
    89         1.69         51          137          .42         31
    91         -.82         53          145         -.26         77
   108          .34         54          147        -1.80         84
   111        -1.49         32          154         -.15         95
   114        -1.25         32          162         -.75         26
   141          .50         55          167        -2.44         51
   145         -.73         43          174        -1.37         46
   154         -.59         63          182         3.16         55
   162        -1.45         14          188          .29         32
   174        -2.04         26          191          .94         73
   182         2.90         61          217        -1.31         38
   188         -.31         14          253        -1.99         27
   191          .46         49          302         -.96         40
   192         5.77         47          319        -1.59         29
   198         1.59         57          337        -1.29         63
   302        -1.22         20          359        -2.35         42
   337        -1.66         35          375         -.62         15
   375        -1.37         11          383        -1.20         49
   651        -1.59         29          514        -1.64         56

estimates  of  item  difficulty  and  between  the  ratings  and  estimates  of  testees' 
abilities  indicates  that  testee  perceptions  of  test  difficulty  and  their 
test  performance  are,  at  least  generally,  accurate. 


The second phase of the analysis showed that for an individual item,
however, there was relatively little relationship between testee perceptions
of item difficulty and testee-ability/item-difficulty discrepancies or the item
scores. The median proportions of variance accounted for by the linear
relationship between the $\hat{\theta}_i - b_j$ discrepancy and the difficulty
perceptions ($r^2$) were only .12 and .11 for the two ability groups. The
median proportions of variance accounted for by the relationship between the
dichotomized item scores and the difficulty perceptions ($r^2_{bis}$) were .16
and .23 for the two groups. In these latter data, however, there again seems
to be a difference in favor of the high-ability group in that their difficulty
perceptions were more highly related to their test behavior.


The  finding  most  relevant  for  the  design  of  ability-testing  procedures  was 
that  items  which  were  judged  by  the  testees  to  be  "about  right"  in  difficulty 
were  not  necessarily  "about  right"  from  a psychometric  point  of  view.  These  data, 
in  fact,  show  that  testees  perceived  items  that  were  somewhat  below  their  ability 
levels  as  being,  on  the  average,  about  right  for  persons  of  their  ability  level. 

In  the  case  of  the  low-ability  students,  the  items  perceived  as  appropriate  had, 
on  the  average,  normal-ogive  difficulty  parameters  which  were  over  1.5  standard 
deviations  below  the  testees'  maximum  likelihood  ability  estimates.  The  high- 
ability  students  judged  items  as  "about  right"  if,  on  the  average,  they  were 
about  one-half  standard  deviation  below  their  ability  levels.  Low-ability 
students  tended  to  judge  items  as  "about  right"  in  difficulty  when  the  items 
were  below  their  ability  levels;  the  high-ability  students  divided  their  "about 
right"  judgements  equally  between  items  which  were  psychometrically  too  easy  and 
those  which  were  psychometrically  too  difficult. 

Conclusions 


These  data  show  that  students'  perceptions  of  the  relative  difficulties  of 
a set  of  ability  test  items  are  quite  accurate,  but  that  their  perceptions  of 
the  difficulties  of  individual  ability-test  items  are  only  moderately  accurate. 
The  data  also  suggest  that  the  ability  level  of  the  testee  has  some  effect  on 
difficulty  perceptions.  Ability  level  also  is  related  to  the  accuracy  of 
perception  of  a testee's  own  test  score.  Thus,  testees  of  different  ability 
levels  seem  to  encounter  a different  psychological  environment  when  interacting 
with  an  ability  test.  This  conclusion  is  further  supported  by  the  students' 
perceptions  of  the  items  which  are  "about  right"  for  their  ability  levels. 

The  psychometric  and  the  psychological  effects  of  adapting  an  ability  test 
to  a level  where  the  testee  perceives  the  test  difficulty  as  "about  right" 
should  be  studied.  Adaptive  testing  strategies  usually  tailor  a test  such  that 
the  estimated  difficulty  of  each  item  administered  is  close  to  the  current 
estimate  of  an  individual's  ability  level.  In  adapting  a test  to  ensure  that 
item  difficulties  are  psychometrically  optimal,  these  strategies  may  also,  in 
effect,  be  tailoring  the  test  so  that  all  of  the  items  are  perceived  by  testees 
as  being  too  difficult  for  persons  of  their  ability  level.  The  psychological 
effects  of  such  a procedure  should  be  investigated  more  fully. 
APPENDIX  A 


Item  Calibration  Procedures 
Initial  Item  Parameter  Estimates 

The item parameterization procedures that were used assumed a normal-ogive
latent trait model and the existence of a bivariate-normal joint distribution
of $\theta$ (levels of the latent ability) and $x$ (the continuous variable
assumed to underlie the dichotomous item responses). Given these assumptions,
discrimination ($a$) and difficulty ($b$) parameters may be defined by
Equations 6 and 7,

$$a_j = \frac{\rho_{\theta x_j}}{\sqrt{1 - \rho_{\theta x_j}^2}} \qquad [6]$$

$$b_j = \frac{\gamma_j}{\rho_{\theta x_j}} \qquad [7]$$

where $\rho_{\theta x_j}$ is the correlation between individuals' ability levels
($\theta$) and their scores ($x$) on item $j$, and

$\gamma_j$ is the z-score above which lies the proportion of testees in the
population knowing the correct answer to item $j$ (Lord & Novick, 1968).
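As a quick illustration of Equations 6 and 7, the following Python sketch converts a correlation and a threshold into the two normal-ogive parameters. The function name and the input values are hypothetical; the sketch assumes a nonzero correlation with absolute value below 1.

```python
import math


def normal_ogive_parameters(rho, gamma):
    """Return (a, b) from Equations 6 and 7.

    rho   : correlation between ability and the latent item variable
            (must satisfy 0 < |rho| < 1 for both equations to be defined)
    gamma : z-score above which lies the proportion knowing the answer
    """
    a = rho / math.sqrt(1.0 - rho ** 2)   # Equation 6
    b = gamma / rho                       # Equation 7
    return a, b


# Hypothetical item: rho = .60, gamma = .30
a, b = normal_ogive_parameters(0.6, 0.3)
```

With these inputs the discrimination is .75 and the difficulty is .50; as the correlation approaches 1.0 the discrimination grows without bound, which is why the estimates are restricted later in this appendix.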


In order to estimate $\rho_{\theta x_j}$, the biserial correlation ($r_j$)
between testees' ability levels and their dichotomized item scores was found by
first estimating the point-biserial correlation ($r^*_j$) between ability
levels and dichotomous item scores by Equation 8, based on data reported by
McBride and Weiss (1974),

$$r^*_j = \frac{(\bar{x}_+ - \bar{x}_-)\sqrt{p_j(1 - p_j)}}{s} \qquad [8]$$

where $\bar{x}_+$ is the mean number-correct score of persons correctly
answering item $j$,
$\bar{x}_-$ is the mean number-correct score of persons incorrectly answering
item $j$,
$p_j$ is the proportion of persons correctly answering item $j$,
$s$ is the standard deviation of number-correct scores for the total
group answering item $j$.


The  biserial  coefficient  was  then  computed  using  the  transformation  in  Equation  9, 

$$r_j = \frac{r^*_j\sqrt{p_j(1 - p_j)}}{\phi(z_j)} \qquad [9]$$

where $z_j$ is the z-score above which lies the proportion of testees in the
norming sample correctly answering item $j$ ($p_j$), and
$\phi(z_j)$ is the density of the normal probability density function at $z_j$.
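The two-step estimate (point-biserial first, then the biserial transformation) might be sketched in Python as follows. This is an illustration only: it assumes the standard point-biserial-to-biserial conversion is what Equation 9 denotes, it uses the population standard deviation for $s$ (the report does not say which form was used), and the function names and data are hypothetical.

```python
import math
from statistics import NormalDist, mean, pstdev


def point_biserial(total_scores, item_scores):
    """Equation 8: point-biserial between total scores and a 0/1 item score.

    Assumes at least one correct and one incorrect response, and uses the
    population standard deviation (an assumption) for s.
    Returns (r_pb, p) where p is the proportion answering correctly.
    """
    right = [t for t, u in zip(total_scores, item_scores) if u == 1]
    wrong = [t for t, u in zip(total_scores, item_scores) if u == 0]
    p = len(right) / len(item_scores)
    s = pstdev(total_scores)
    r_pb = (mean(right) - mean(wrong)) * math.sqrt(p * (1.0 - p)) / s
    return r_pb, p


def biserial_from_point_biserial(r_pb, p):
    """Standard conversion, assumed to correspond to Equation 9."""
    z = NormalDist().inv_cdf(1.0 - p)   # z-score with proportion p above it
    return r_pb * math.sqrt(p * (1.0 - p)) / NormalDist().pdf(z)


# Hypothetical data: four testees, the two highest scorers answer correctly.
r_pb, p = point_biserial([1, 2, 3, 4], [0, 0, 1, 1])
r_bis = biserial_from_point_biserial(0.4, 0.5)
```

The biserial is always at least as large in absolute value as the point-biserial from which it is derived, and in small samples it can exceed 1.0, which is one reason the estimates are restricted to a fixed interval in the steps that follow.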


Because a testee could answer an item correctly simply by random guessing
on these 5-alternative, multiple-choice items, a guessing parameter ($c$) was
defined for each item by Equation 10,

$$c_j = 1/n_j \qquad [10]$$

where $n_j$ is the number of response alternatives on item $j$.


In order to account for guessing when the initial $a$ and $b$ parameters used
to construct the tests described in this report were derived, the estimate of
$\rho_{\theta x_j}$ ($r_j$) computed in Equation 9 was modified according to
Equation 11,

$$r'_j = \frac{r_j}{1 - c_j} \qquad [11]$$

The estimate of $\rho_{\theta x_j}$ resulting from Equation 11 ($r'_j$) was
restricted to the interval from -1.0 to +1.0 and used, along with $z_j$ (as an
estimate of $\gamma_j$), to calculate values of $a$ and $b$ for each item using
Equations 6 and 7. The resulting values of $a_j$ were then restricted to the
interval from -3.0 to +3.0. The restrictions on $r'_j$ and $a_j$ thus affected
both the values of the $a$ and $b$ parameters but the effects of the
restrictions were not necessarily consistent.
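The sequence of steps for the initial estimates can be sketched as below. This is a hedged illustration, not the original program: the function name is hypothetical, and the handling of $|r'_j| = 1$ (where Equation 6 is undefined) by assigning the bound of 3.0 directly is an assumption consistent with, but not stated in, the text.

```python
import math
from statistics import NormalDist


def initial_item_parameters(r_bis, p, n_alternatives=5):
    """Initial (a, b, c) for one item, following Equations 10, 11, 6, and 7.

    r_bis : biserial correlation from Equation 9 (assumed nonzero)
    p     : proportion answering the item correctly (0 < p < 1)
    """
    c = 1.0 / n_alternatives                    # Equation 10
    r_corr = r_bis / (1.0 - c)                  # Equation 11
    r_corr = max(-1.0, min(1.0, r_corr))        # restrict to [-1.0, +1.0]
    z = NormalDist().inv_cdf(1.0 - p)           # z-score with proportion p above it
    if abs(r_corr) >= 1.0:
        # Assumption: Equation 6 diverges here, so a is set to its bound.
        a = math.copysign(3.0, r_corr)
    else:
        a = r_corr / math.sqrt(1.0 - r_corr ** 2)   # Equation 6
        a = max(-3.0, min(3.0, a))                  # restrict to [-3.0, +3.0]
    b = z / r_corr                              # Equation 7
    return a, b, c


a1, b1, c1 = initial_item_parameters(0.4, 0.5)        # unrestricted case
a2, b2, c2 = initial_item_parameters(0.9, 0.5)        # r' exceeds 1.0 and is clipped
a3, b3, c3 = initial_item_parameters(0.4, 0.8413447)  # easy item, z close to -1
```

Note that $b$ is computed from the restricted $r'_j$ but not from the restricted $a_j$, so the two restrictions do not act on the two parameters in the same way, as the paragraph above observes.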


Revised  Item  Parameter  Estimates 


The  item  parameter  estimates  derived  from  the  above  procedures  were  used 
to  select  items  for  the  tests  administered  in  this  study.  In  the  time  interval 
between  the  construction  of  the  tests  and  the  analysis  of  the  data,  it  became 
apparent  that  certain  revisions  to  these  item  parameter  estimates  were  necessary 
for  each  item.  These  revised  estimates  were  computed  for  all  569  items  in  the 
pool  from  which  items  for  this  study  were  selected. 


In computing the revised estimates of $a$ and $b$ used to analyze the present
data, the proportion of testees who actually knew the correct answer to an item
($p'_j$) was estimated from the proportion of testees in the population who
actually answered the item correctly ($p_j$) and the estimate of $c_j$ using
Equation 12,

$$p'_j = \frac{p_j - c_j}{1 - c_j} \qquad [12]$$
An estimate of $\rho_{\theta x_j}$ suggested by Urry (1975) was then computed
by Equation 13,

$$r'_j = \frac{r^*_j\sqrt{p_j(1 - p_j)}}{(1 - c_j)\,\phi(z'_j)} \qquad [13]$$

where $z'_j$ is the z-score above which lies the proportion of testees in the
sample who were estimated to actually know the answer to item $j$ ($p'_j$), and
$\phi(z'_j)$ is the density of the normal probability density function at $z'_j$.

This estimate of $\rho_{\theta x_j}$ was then used, along with $z'_j$ as an
estimate of $\gamma_j$, in Equations 6 and 7 to calculate the revised $a$ and
$b$ parameters. If $p_j < c_j$, $p'_j$ was set equal to .001. If
$|r'_j| > .9486833$, $r'_j$ was set equal to .9486833 with the appropriate
sign. This restricted the $a$-values to the interval from -3.0 to +3.0 and
influenced the $b$-values through Equation 7.
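A minimal Python sketch of the revised procedure follows. It rests on several assumptions that should be checked against the original report: the guessing-corrected biserial is written with the factor $(1 - c_j)$ in the denominator, the floor of .001 is also applied when $p_j = c_j$ (to avoid an infinite z-score), and the function and constant names are hypothetical.

```python
import math
from statistics import NormalDist

R_MAX = 0.9486833  # bound on |r'| given in the text; it yields |a| <= 3.0


def revised_item_parameters(r_pb, p, c):
    """Revised (a, b, p_know) for one item.

    r_pb : point-biserial between ability measure and item score (nonzero)
    p    : proportion answering correctly (0 < p < 1)
    c    : guessing parameter, 1 / number of alternatives
    """
    # Equation 12, with the floor described in the text. Applying the floor
    # at p == c as well is an assumption made to keep z finite.
    p_know = (p - c) / (1.0 - c) if p > c else 0.001
    z_know = NormalDist().inv_cdf(1.0 - p_know)   # proportion p_know lies above
    # Guessing-corrected biserial (assumed form of Equation 13).
    r_corr = r_pb * math.sqrt(p * (1.0 - p)) / ((1.0 - c) * NormalDist().pdf(z_know))
    r_corr = max(-R_MAX, min(R_MAX, r_corr))      # restriction on r'
    a = r_corr / math.sqrt(1.0 - r_corr ** 2)     # Equation 6
    b = z_know / r_corr                           # Equation 7
    return a, b, p_know


a1, b1, pk1 = revised_item_parameters(0.3, 0.6, 0.2)   # typical item
a2, b2, pk2 = revised_item_parameters(0.9, 0.6, 0.2)   # r' is clipped
a3, b3, pk3 = revised_item_parameters(0.3, 0.1, 0.2)   # p below chance level
```

The clipped case shows why the bound .9486833 is used: its square is .90, so Equation 6 gives .9486833 / √.10, which is 3.0 to within rounding.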


This latter procedure^ differs from that suggested by Jensema (1976) only
in that Jensema chose to remove each item from the computation of the test
score estimating $\theta$ during the computation of that item's parameters. For
test scores based on large numbers of items, the effects of this exclusion
should be negligible.


Comparison  of  Original  and  Revised  Item  Parameters 

For items in the pool with $b$ parameters between ±3.0, Figure A-1 presents
the bivariate plot of the original and the revised $b$ parameters. As Figure A-1
shows, the revised $b$ estimates were closely related to the original $b$-values
(Pearson product-moment $r$=.98). The bivariate plot of original and revised
$a$-values is shown in Figure A-2. As this figure shows, the revised $a$-values
were not as closely related to the original $a$-values (Pearson product-moment
$r$=.74) as were the revised $b$-values.


To determine the effects of the revised item parameters on ability estimates
computed using those parameters, maximum likelihood ability estimates were
computed using both sets of item parameters for the 185 CLA students involved
in this study. The bivariate plot of the two sets of maximum likelihood ability
estimates is shown in Figure A-3. The resulting Pearson product-moment
correlation of .96 indicated that the ability estimates did not differ greatly
depending on whether the original or revised normal-ogive item-parameter
estimates were used. This high correlation suggests that essentially the same
conclusions would be drawn in this study from the use of either the original
set of item parameters or the revised set of parameter estimates based on
Urry's (1975) correction procedure.


^These  procedures  were  suggested  by  James  B.  Sympson  of  the  University  of 
Minnesota. 


Figure  A-1 

Joint  Distribution  of  Original  and  Revised  Difficulty 
Parameter  (b)  Estimates 
Figure A-3

Joint Distribution of Maximum-likelihood Ability Estimates ($\hat{\theta}$)
Based on the Original and the Revised Item-parameter Estimates
APPENDIX B


Table B-1

Order of Administration and Normal Ogive Discrimination (a) and
Difficulty (b) Parameters for Items on Tests for the Low- and High-Ability Groups


             Low-Ability Group                              High-Ability Group
  Item      Item Sequence      Parameters         Item      Item Sequence      Parameters
Ref. No.   A    B    C    D      a        b      Ref. No.   A    B    C    D      a        b

     2    11   32   37   16     .517   -3.810         2    41    7   27   21     .517   -3.810
     4    24   24   10   38     .397   -5.561         7    39    8   26   22    3.000   -2.324
     7     3    3    3    3    3.000   -2.324        14    22   26    8   40    2.208   -2.461
    14    40    9   25   23    2.208   -2.461        18     1    1    1    1     .483   -4.241
    18    41    7   27   21     .483   -4.241        19    28   20   14   34     .710   -3.808
    19    16   37   32   11     .710   -3.808        23    18   39   30    9     .713   -3.862
    20     1    1    1    1     .381   -5.764        24    30   18   16   32    1.749   -2.366
    23    22   26    8   40     .713   -3.862        39     5    5    5    5     .347   -3.625
    24    13   34   35   14    1.749   -2.366        44    32   16   18   30    1.145   -1.412
    29    25   23   11   37     .323   -5.521        51    27   21   13   35    1.432   -1.043
    41     7   28   41   20     .272   -6.450        56    34   14   20   28    1.109     .135
    44    15   36   33   12    1.145   -1.412        64    23   25    9   39    3.000   -2.363
    51    34   14   20   28    1.432   -1.043        68    15   36   33   12    1.014   -2.479
    55    29   19   15   33     .288   -4.953        77    10   31   38   17     .442   -3.602
    56    17   38   31   10    1.109     .135        86     7   28   41   20     .887   -1.189

62 

18 

39 

30 

9 

.426 

-4.952 
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